Dataset statistics
| Number of variables | 24 |
|---|---|
| Number of observations | 365369 |
| Missing cells | 105505 |
| Missing cells (%) | 1.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 69.7 MiB |
| Average record size in memory | 200.0 B |
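The two derived figures in this table can be checked against the raw counts. A minimal sketch, assuming the missing-cell percentage is taken over variables × observations and that 1 MiB is 2^20 bytes (both are assumptions, although they reproduce the reported values):

```python
# Values copied from the "Dataset statistics" table.
n_variables = 24
n_observations = 365_369
missing_cells = 105_505
total_size_mib = 69.7

# Assumption: the percentage is taken over all cells (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```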
Variable types
| Numeric | 15 |
|---|---|
| Categorical | 9 |
zipcode is highly overall correlated with lng and 2 other fields | High correlation |
target_clean is highly overall correlated with log_target_clean and 3 other fields | High correlation |
log_target_clean is highly overall correlated with target_clean and 3 other fields | High correlation |
sqft_clean is highly overall correlated with target_clean and 3 other fields | High correlation |
log_sqft_clean is highly overall correlated with target_clean and 3 other fields | High correlation |
baths_clean is highly overall correlated with target_clean and 3 other fields | High correlation |
population is highly overall correlated with housing_units and 1 other field | High correlation |
lat is highly overall correlated with state_zip and 1 other field | High correlation |
lng is highly overall correlated with zipcode and 2 other fields | High correlation |
median_household_income is highly overall correlated with mean_rating | High correlation |
housing_units is highly overall correlated with population and 1 other field | High correlation |
occupied_housing_units is highly overall correlated with population and 1 other field | High correlation |
mean_rating is highly overall correlated with median_household_income | High correlation |
state_zip is highly overall correlated with zipcode and 3 other fields | High correlation |
major_city_ch is highly overall correlated with zipcode and 3 other fields | High correlation |
mlsid_join_bool is highly imbalanced (51.4%) | Imbalance |
status_cl is highly imbalanced (68.4%) | Imbalance |
baths_clean has 102041 (27.9%) missing values | Missing |
target_clean is highly skewed (γ₁ = 25.64772097) | Skewed |
sqft_clean is highly skewed (γ₁ = 150.3844613) | Skewed |
sqft_clean has 48935 (13.4%) zeros | Zeros |
log_sqft_clean has 48935 (13.4%) zeros | Zeros |
home_age has 29405 (8.0%) zeros | Zeros |
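The 51.4% imbalance reported for mlsid_join_bool is consistent with a score of one minus the normalized Shannon entropy of the class shares. That this is the formula behind the alert is an inference from the numbers in this report, not something the report states, and the function name below is hypothetical. A minimal sketch:

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k) for k >= 2 classes, with H the Shannon entropy in bits."""
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(len(counts))

# Class counts for mlsid_join_bool taken from this report (1: 326793, 0: 38576).
score = imbalance_score([326_793, 38_576])
print(f"{score:.1%}")
```

Under this definition a perfectly balanced column scores 0 and a column dominated by one class approaches 1.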
Reproduction
| Analysis started | 2024-05-16 13:39:20.007802 |
|---|---|
| Analysis finished | 2024-05-16 13:40:40.585670 |
| Duration | 1 minute and 20.58 seconds |
| Software version | ydata-profiling v4.5.1 |
| Download configuration | config.json |
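The reported duration follows directly from the two timestamps. A short check using only the standard library (the format string is an assumption about how the timestamps are written):

```python
from datetime import datetime

# Assumed timestamp layout: date, time, microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-05-16 13:39:20.007802", FMT)
finished = datetime.strptime("2024-05-16 13:40:40.585670", FMT)

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute and {seconds:.2f} seconds")
```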
zipcode
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 4231 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 51624.456 |
| Minimum | 1104 |
|---|---|
| Maximum | 99338 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 1104 |
|---|---|
| 5-th percentile | 11235 |
| Q1 | 32836 |
| median | 37205 |
| Q3 | 77386 |
| 95-th percentile | 95620 |
| Maximum | 99338 |
| Range | 98234 |
| Interquartile range (IQR) | 44550 |
Descriptive statistics
| Standard deviation | 26828.688 |
|---|---|
| Coefficient of variation (CV) | 0.5196895 |
| Kurtosis | -1.3445354 |
| Mean | 51624.456 |
| Median Absolute Deviation (MAD) | 17189 |
| Skewness | 0.29467617 |
| Sum | 1.8861976 × 10¹⁰ |
| Variance | 7.1977851 × 10⁸ |
| Monotonicity | Not monotonic |
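Several entries in the quantile and descriptive tables are functions of others: range is maximum minus minimum, IQR is Q3 minus Q1, CV is standard deviation over mean, and variance is the squared standard deviation. A quick sketch that recomputes them from the zipcode values; small differences in the last digit are expected because the inputs are already rounded:

```python
# Values copied from the zipcode tables.
minimum, maximum = 1104, 99338
q1, q3 = 32836, 77386
mean, std = 51624.456, 26828.688

value_range = maximum - minimum
iqr = q3 - q1
cv = std / mean          # coefficient of variation
variance = std ** 2

print(value_range, iqr, round(cv, 7), f"{variance:.7e}")
```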
| Value | Count | Frequency (%) |
| 32137 | 2104 | 0.6% |
| 33131 | 1550 | 0.4% |
| 34747 | 1474 | 0.4% |
| 78245 | 1361 | 0.4% |
| 33137 | 1298 | 0.4% |
| 33132 | 1295 | 0.4% |
| 78253 | 1252 | 0.3% |
| 34759 | 1240 | 0.3% |
| 78254 | 1211 | 0.3% |
| 33130 | 1161 | 0.3% |
| Other values (4221) | 351423 | 96.2% |
| Value | Count | Frequency (%) |
| 1104 | 10 | < 0.1% |
| 1105 | 7 | < 0.1% |
| 1106 | 1 | < 0.1% |
| 1107 | 2 | < 0.1% |
| 1108 | 13 | < 0.1% |
| 1109 | 18 | < 0.1% |
| 1118 | 8 | < 0.1% |
| 1119 | 7 | < 0.1% |
| 1128 | 4 | < 0.1% |
| 1129 | 7 | < 0.1% |
| Value | Count | Frequency (%) |
| 99338 | 103 | < 0.1% |
| 99337 | 146 | < 0.1% |
| 99336 | 126 | < 0.1% |
| 99224 | 123 | < 0.1% |
| 99223 | 92 | < 0.1% |
| 99218 | 33 | < 0.1% |
| 99217 | 80 | < 0.1% |
| 99216 | 24 | < 0.1% |
| 99212 | 50 | < 0.1% |
| 99208 | 190 | 0.1% |
target_clean
Real number (ℝ)
HIGH CORRELATION  SKEWED 
| Distinct | 34054 |
|---|---|
| Distinct (%) | 9.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 644634.16 |
| Minimum | 1 |
|---|---|
| Maximum | 1.95 × 10⁸ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 39500 |
| Q1 | 189900 |
| median | 324500 |
| Q3 | 585000 |
| 95-th percentile | 1950000 |
| Maximum | 1.95 × 10⁸ |
| Range | 1.95 × 10⁸ |
| Interquartile range (IQR) | 395100 |
Descriptive statistics
| Standard deviation | 1835212.7 |
|---|---|
| Coefficient of variation (CV) | 2.8469058 |
| Kurtosis | 1393.2279 |
| Mean | 644634.16 |
| Median Absolute Deviation (MAD) | 168600 |
| Skewness | 25.647721 |
| Sum | 2.3552934 × 10¹¹ |
| Variance | 3.3680058 × 10¹² |
| Monotonicity | Not monotonic |
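The skewness figure is the third standardized moment. The sketch below implements the moment-based (population) form, g1 = m3 / m2^1.5; whether the profiler applies a bias correction is not stated in this report, and for a sample of this size the difference would be negligible. As a stand-in for the full column, it uses only the five reported quantiles of target_clean, so the numbers it prints illustrate the formula and the effect of a log transform, not the dataset's actual skewness.

```python
import math

def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Illustrative five-point sample: the 5th, 25th, 50th, 75th and 95th
# percentiles of target_clean reported above (not the real column).
sample = [39_500, 189_900, 324_500, 585_000, 1_950_000]
log_sample = [math.log(v) for v in sample]

print(round(skewness(sample), 2), round(skewness(log_sample), 2))
```

On this sample the raw values are clearly right-skewed while the logged values are close to symmetric, which matches the direction of the change between target_clean and log_target_clean.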
| Value | Count | Frequency (%) |
| 225000 | 1775 | 0.5% |
| 350000 | 1617 | 0.4% |
| 275000 | 1610 | 0.4% |
| 250000 | 1595 | 0.4% |
| 325000 | 1534 | 0.4% |
| 399000 | 1516 | 0.4% |
| 299900 | 1507 | 0.4% |
| 249900 | 1472 | 0.4% |
| 299000 | 1431 | 0.4% |
| 450000 | 1418 | 0.4% |
| Other values (34044) | 349894 | 95.8% |
| Value | Count | Frequency (%) |
| 1 | 13 | < 0.1% |
| 3 | 2 | < 0.1% |
| 8 | 1 | < 0.1% |
| 20 | 1 | < 0.1% |
| 25 | 1 | < 0.1% |
| 29 | 1 | < 0.1% |
| 30 | 1 | < 0.1% |
| 250 | 1 | < 0.1% |
| 393 | 1 | < 0.1% |
| 400 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 195000000 | 1 | < 0.1% |
| 165000000 | 2 | < 0.1% |
| 150000000 | 1 | < 0.1% |
| 129000000 | 1 | < 0.1% |
| 115000000 | 2 | < 0.1% |
| 110000000 | 2 | < 0.1% |
| 98000000 | 1 | < 0.1% |
| 88000000 | 1 | < 0.1% |
| 87000000 | 1 | < 0.1% |
| 85000000 | 1 | < 0.1% |
log_target_clean
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 34054 |
|---|---|
| Distinct (%) | 9.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.652744 |
| Minimum | 0 |
|---|---|
| Maximum | 19.08851 |
| Zeros | 13 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 10.584056 |
| Q1 | 12.154253 |
| median | 12.690041 |
| Q3 | 13.279367 |
| 95-th percentile | 14.48334 |
| Maximum | 19.08851 |
| Range | 19.08851 |
| Interquartile range (IQR) | 1.1251142 |
Descriptive statistics
| Standard deviation | 1.1925631 |
|---|---|
| Coefficient of variation (CV) | 0.094253317 |
| Kurtosis | 3.6599259 |
| Mean | 12.652744 |
| Median Absolute Deviation (MAD) | 0.56192971 |
| Skewness | -0.68997535 |
| Sum | 4622920.3 |
| Variance | 1.4222066 |
| Monotonicity | Not monotonic |
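The values of log_target_clean are consistent with a plain natural logarithm of target_clean: a price of 1 maps to 0, which explains the 13 zeros, and the maximum 1.95 × 10⁸ maps to 19.08851. This is inferred from the numbers in the report rather than from the preprocessing code, which is not shown here. A small check:

```python
import math

# (target_clean value, reported log_target_clean value) pairs from this report.
reported_pairs = [
    (1, 0.0),
    (225_000, 12.32385568),
    (195_000_000, 19.08851012),
]

# Largest gap between the natural log and the reported value.
max_gap = max(abs(math.log(raw) - logged) for raw, logged in reported_pairs)
print(f"largest absolute difference: {max_gap:.2e}")
```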
| Value | Count | Frequency (%) |
| 12.32385568 | 1775 | 0.5% |
| 12.76568843 | 1617 | 0.4% |
| 12.52452638 | 1610 | 0.4% |
| 12.4292162 | 1595 | 0.4% |
| 12.69158046 | 1534 | 0.4% |
| 12.8967167 | 1516 | 0.4% |
| 12.61120436 | 1507 | 0.4% |
| 12.42881612 | 1472 | 0.4% |
| 12.60819885 | 1431 | 0.4% |
| 13.01700286 | 1418 | 0.4% |
| Other values (34044) | 349894 | 95.8% |
| Value | Count | Frequency (%) |
| 0 | 13 | < 0.1% |
| 1.098612289 | 2 | < 0.1% |
| 2.079441542 | 1 | < 0.1% |
| 2.995732274 | 1 | < 0.1% |
| 3.218875825 | 1 | < 0.1% |
| 3.36729583 | 1 | < 0.1% |
| 3.401197382 | 1 | < 0.1% |
| 5.521460918 | 1 | < 0.1% |
| 5.973809612 | 1 | < 0.1% |
| 5.991464547 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 19.08851012 | 1 | < 0.1% |
| 18.92145603 | 2 | < 0.1% |
| 18.82614585 | 1 | < 0.1% |
| 18.67532296 | 1 | < 0.1% |
| 18.56044269 | 2 | < 0.1% |
| 18.51599092 | 2 | < 0.1% |
| 18.40047804 | 1 | < 0.1% |
| 18.29284737 | 1 | < 0.1% |
| 18.28141868 | 1 | < 0.1% |
| 18.25816181 | 1 | < 0.1% |
private_pool_join
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.6 MiB |
| 0 | 321341 |
|---|---|
| 1 | 44028 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 365369 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 321341 | 87.9% |
| 1 | 44028 | 12.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 321341 | 87.9% |
| 1 | 44028 | 12.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 365369 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 365369 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 365369 | 100.0% |
mlsid_join_bool
Categorical
IMBALANCE 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.6 MiB |
| 1 | 326793 |
|---|---|
| 0 | 38576 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 365369 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 326793 | 89.4% |
| 0 | 38576 | 10.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 326793 | 89.4% |
| 0 | 38576 | 10.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 365369 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 365369 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 365369 | 100.0% |
sqft_clean
Real number (ℝ)
HIGH CORRELATION  SKEWED  ZEROS 
| Distinct | 9841 |
|---|---|
| Distinct (%) | 2.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2650.1925 |
| Minimum | 0 |
|---|---|
| Maximum | 7078574 |
| Zeros | 48935 |
| Zeros (%) | 13.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1050 |
| median | 1666 |
| Q3 | 2470 |
| 95-th percentile | 4542 |
| Maximum | 7078574 |
| Range | 7078574 |
| Interquartile range (IQR) | 1420 |
Descriptive statistics
| Standard deviation | 30587.402 |
|---|---|
| Coefficient of variation (CV) | 11.541578 |
| Kurtosis | 30142.676 |
| Mean | 2650.1925 |
| Median Absolute Deviation (MAD) | 691 |
| Skewness | 150.38446 |
| Sum | 9.6829817 × 10⁸ |
| Variance | 9.3558917 × 10⁸ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 48935 | 13.4% |
| 1200 | 1388 | 0.4% |
| 1000 | 996 | 0.3% |
| 1500 | 975 | 0.3% |
| 1800 | 951 | 0.3% |
| 1100 | 921 | 0.3% |
| 1400 | 877 | 0.2% |
| 2000 | 840 | 0.2% |
| 1600 | 799 | 0.2% |
| 800 | 735 | 0.2% |
| Other values (9831) | 307952 | 84.3% |
| Value | Count | Frequency (%) |
| 0 | 48935 | 13.4% |
| 1 | 71 | < 0.1% |
| 2 | 6 | < 0.1% |
| 3 | 2 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 2 | < 0.1% |
| 6 | 1 | < 0.1% |
| 10 | 2 | < 0.1% |
| 11 | 1 | < 0.1% |
| 12 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 7078574 | 3 | < 0.1% |
| 5728968 | 1 | < 0.1% |
| 4356000 | 2 | < 0.1% |
| 2807917 | 2 | < 0.1% |
| 2613600 | 1 | < 0.1% |
| 2585006 | 2 | < 0.1% |
| 1916640 | 1 | < 0.1% |
| 1761113 | 1 | < 0.1% |
| 1611720 | 1 | < 0.1% |
| 1598652 | 1 | < 0.1% |
log_sqft_clean
Real number (ℝ)
HIGH CORRELATION  ZEROS 
| Distinct | 9841 |
|---|---|
| Distinct (%) | 2.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.5390758 |
| Minimum | 0 |
|---|---|
| Maximum | 15.772583 |
| Zeros | 48935 |
| Zeros (%) | 13.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 6.9574974 |
| median | 7.4187809 |
| Q3 | 7.8123782 |
| 95-th percentile | 8.4213429 |
| Maximum | 15.772583 |
| Range | 15.772583 |
| Interquartile range (IQR) | 0.85488084 |
Descriptive statistics
| Standard deviation | 2.6354149 |
|---|---|
| Coefficient of variation (CV) | 0.40302559 |
| Kurtosis | 2.2051768 |
| Mean | 6.5390758 |
| Median Absolute Deviation (MAD) | 0.4211849 |
| Skewness | -1.9278333 |
| Sum | 2389175.6 |
| Variance | 6.9454118 |
| Monotonicity | Not monotonic |
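Unlike log_target_clean, the values of log_sqft_clean match log(1 + x) rather than log(x): a sqft_clean of 0 stays 0 (hence the identical 48935 zeros in both columns), 1 maps to 0.6931471806, and 1200 maps to 7.090909822. This is an inference from the reported numbers, since the transformation code is not part of the report. A small comparison of the two candidates:

```python
import math

# (sqft_clean value, reported log_sqft_clean value) pairs from this report.
reported_pairs = [
    (0, 0.0),
    (1, 0.6931471806),
    (1200, 7.090909822),
    (7_078_574, 15.77258317),
]

gap_log1p = max(abs(math.log1p(raw) - logged) for raw, logged in reported_pairs)
# Plain log is undefined at 0, so compare it on the positive values only.
gap_log = max(abs(math.log(raw) - logged) for raw, logged in reported_pairs if raw > 0)

print(f"log1p gap: {gap_log1p:.2e}, plain log gap: {gap_log:.2e}")
```

Because the zeros survive the transform unchanged, they form a separate spike well below the rest of the log_sqft_clean distribution, which is consistent with the 5th percentile of 0 and the negative skewness reported above.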
| Value | Count | Frequency (%) |
| 0 | 48935 | 13.4% |
| 7.090909822 | 1388 | 0.4% |
| 6.908754779 | 996 | 0.3% |
| 7.313886832 | 975 | 0.3% |
| 7.496097345 | 951 | 0.3% |
| 7.003974137 | 921 | 0.3% |
| 7.244941546 | 877 | 0.2% |
| 7.601402335 | 840 | 0.2% |
| 7.378383713 | 799 | 0.2% |
| 6.685860947 | 735 | 0.2% |
| Other values (9831) | 307952 | 84.3% |
| Value | Count | Frequency (%) |
| 0 | 48935 | 13.4% |
| 0.6931471806 | 71 | < 0.1% |
| 1.098612289 | 6 | < 0.1% |
| 1.386294361 | 2 | < 0.1% |
| 1.609437912 | 1 | < 0.1% |
| 1.791759469 | 2 | < 0.1% |
| 1.945910149 | 1 | < 0.1% |
| 2.397895273 | 2 | < 0.1% |
| 2.48490665 | 1 | < 0.1% |
| 2.564949357 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 15.77258317 | 3 | < 0.1% |
| 15.56104614 | 1 | < 0.1% |
| 15.28706499 | 2 | < 0.1% |
| 14.84795384 | 2 | < 0.1% |
| 14.77623952 | 1 | < 0.1% |
| 14.76523877 | 2 | < 0.1% |
| 14.46608473 | 1 | < 0.1% |
| 14.38145712 | 1 | < 0.1% |
| 14.29281311 | 1 | < 0.1% |
| 14.28467196 | 1 | < 0.1% |
baths_clean
Real number (ℝ)
HIGH CORRELATION  MISSING 
| Distinct | 78 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 102041 |
| Missing (%) | 27.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.723222 |
| Minimum | 0 |
|---|---|
| Maximum | 76 |
| Zeros | 3581 |
| Zeros (%) | 1.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 2.5 |
| Q3 | 3 |
| 95-th percentile | 5 |
| Maximum | 76 |
| Range | 76 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.4458969 |
|---|---|
| Coefficient of variation (CV) | 0.53095078 |
| Kurtosis | 121.36787 |
| Mean | 2.723222 |
| Median Absolute Deviation (MAD) | 0.5 |
| Skewness | 5.4607315 |
| Sum | 717100.6 |
| Variance | 2.0906177 |
| Monotonicity | Not monotonic |
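With 27.9% of baths_clean missing, it matters which rows the summary statistics cover. Dividing the reported sum by the reported mean recovers the number of non-missing rows, which suggests the mean is computed over present values only. A short check using the figures from this section:

```python
# Values copied from this report.
n_observations = 365_369
n_missing = 102_041
reported_sum = 717_100.6
reported_mean = 2.723222

n_present = n_observations - n_missing
implied_n = reported_sum / reported_mean   # rows the mean was averaged over

print(n_present, round(implied_n))
print(f"missing share: {n_missing / n_observations:.1%}")
```

The frequency column in the value tables, by contrast, appears to be relative to all 365369 rows: for example, 25701 / 365369 gives the 7.0% shown for the value 4.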
| Value | Count | Frequency (%) |
| 2 | 101385 | 27.7% |
| 3 | 65319 | 17.9% |
| 4 | 25701 | 7.0% |
| 1 | 17174 | 4.7% |
| 2.5 | 13735 | 3.8% |
| 5 | 9291 | 2.5% |
| 3.5 | 6062 | 1.7% |
| 1.5 | 4253 | 1.2% |
| 6 | 4228 | 1.2% |
| 0 | 3581 | 1.0% |
| Other values (68) | 12599 | 3.4% |
| (Missing) | 102041 | 27.9% |
| Value | Count | Frequency (%) |
| 0 | 3581 | 1.0% |
| 0.5 | 1 | < 0.1% |
| 0.75 | 2 | < 0.1% |
| 1 | 17174 | 4.7% |
| 1.1 | 13 | < 0.1% |
| 1.25 | 1186 | 0.3% |
| 1.5 | 4253 | 1.2% |
| 1.75 | 1787 | 0.5% |
| 2 | 101385 | 27.7% |
| 2.1 | 50 | < 0.1% |
| Value | Count | Frequency (%) |
| 76 | 1 | < 0.1% |
| 68 | 1 | < 0.1% |
| 64 | 1 | < 0.1% |
| 60 | 1 | < 0.1% |
| 55 | 1 | < 0.1% |
| 44 | 3 | < 0.1% |
| 43 | 1 | < 0.1% |
| 42 | 1 | < 0.1% |
| 41 | 1 | < 0.1% |
| 40 | 4 | < 0.1% |
fireplace_booled
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.6 MiB |
| 0 | 266863 |
|---|---|
| 1 | 98506 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 365369 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 266863 | 73.0% |
| 1 | 98506 | 27.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 266863 | 73.0% |
| 1 | 98506 | 27.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 365369 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 365369 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 365369 | 100.0% |
population
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 4014 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 866 |
| Missing (%) | 0.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 33388.017 |
| Minimum | 79 |
|---|---|
| Maximum | 113916 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 79 |
|---|---|
| 5-th percentile | 9384 |
| Q1 | 20507 |
| median | 31099 |
| Q3 | 43194 |
| 95-th percentile | 66551 |
| Maximum | 113916 |
| Range | 113837 |
| Interquartile range (IQR) | 22687 |
Descriptive statistics
| Standard deviation | 17679.901 |
|---|---|
| Coefficient of variation (CV) | 0.52952833 |
| Kurtosis | 1.2283742 |
| Mean | 33388.017 |
| Median Absolute Deviation (MAD) | 11232 |
| Skewness | 0.90472715 |
| Sum | 1.2170032 × 10¹⁰ |
| Variance | 3.1257889 × 10⁸ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 37821 | 2104 | 0.6% |
| 14917 | 1550 | 0.4% |
| 13692 | 1474 | 0.4% |
| 56511 | 1361 | 0.4% |
| 19410 | 1298 | 0.4% |
| 11165 | 1295 | 0.4% |
| 29007 | 1252 | 0.3% |
| 30170 | 1240 | 0.3% |
| 44817 | 1211 | 0.3% |
| 26108 | 1161 | 0.3% |
| Other values (4004) | 350557 | 95.9% |
| Value | Count | Frequency (%) |
| 79 | 2 | < 0.1% |
| 105 | 1 | < 0.1% |
| 114 | 15 | < 0.1% |
| 153 | 18 | < 0.1% |
| 164 | 1 | < 0.1% |
| 169 | 1 | < 0.1% |
| 170 | 2 | < 0.1% |
| 179 | 1 | < 0.1% |
| 191 | 6 | < 0.1% |
| 197 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 113916 | 117 | < 0.1% |
| 111086 | 134 | < 0.1% |
| 109931 | 131 | < 0.1% |
| 105549 | 54 | < 0.1% |
| 103892 | 97 | < 0.1% |
| 103689 | 44 | < 0.1% |
| 101572 | 147 | < 0.1% |
| 100820 | 123 | < 0.1% |
| 99598 | 199 | 0.1% |
| 98592 | 109 | < 0.1% |
lat
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 1508 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 33.47784 |
| Minimum | 25.56 |
|---|---|
| Maximum | 48.79 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 25.56 |
|---|---|
| 5-th percentile | 25.82 |
| Q1 | 28.8 |
| median | 32.78 |
| Q3 | 38.78 |
| 95-th percentile | 43.04 |
| Maximum | 48.79 |
| Range | 23.23 |
| Interquartile range (IQR) | 9.98 |
Descriptive statistics
| Standard deviation | 5.9106505 |
|---|---|
| Coefficient of variation (CV) | 0.17655412 |
| Kurtosis | -0.67189639 |
| Mean | 33.47784 |
| Median Absolute Deviation (MAD) | 4.38 |
| Skewness | 0.54441271 |
| Sum | 12231765 |
| Variance | 34.93579 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 25.85 | 3088 | 0.8% |
| 25.77 | 2879 | 0.8% |
| 25.78 | 2738 | 0.7% |
| 29.8 | 2543 | 0.7% |
| 25.82 | 2542 | 0.7% |
| 29.57 | 2483 | 0.7% |
| 29.74 | 2285 | 0.6% |
| 29.47 | 2048 | 0.6% |
| 29.4 | 2030 | 0.6% |
| 29.77 | 1957 | 0.5% |
| Other values (1498) | 340776 | 93.3% |
| Value | Count | Frequency (%) |
| 25.56 | 281 | 0.1% |
| 25.57 | 202 | 0.1% |
| 25.6 | 580 | 0.2% |
| 25.61 | 542 | 0.1% |
| 25.64 | 74 | < 0.1% |
| 25.65 | 713 | 0.2% |
| 25.66 | 389 | 0.1% |
| 25.67 | 664 | 0.2% |
| 25.68 | 567 | 0.2% |
| 25.7 | 955 | 0.3% |
| Value | Count | Frequency (%) |
| 48.79 | 140 | < 0.1% |
| 48.73 | 49 | < 0.1% |
| 48.68 | 133 | < 0.1% |
| 48.21 | 4 | < 0.1% |
| 48.18 | 1 | < 0.1% |
| 48.09 | 70 | < 0.1% |
| 48.05 | 142 | < 0.1% |
| 47.98 | 72 | < 0.1% |
| 47.95 | 8 | < 0.1% |
| 47.94 | 142 | < 0.1% |
lng
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 1955 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -90.487914 |
| Minimum | -123.4 |
|---|---|
| Maximum | -67.07 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 365369 |
| Negative (%) | 100.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | -123.4 |
|---|---|
| 5-th percentile | -121.88 |
| Q1 | -97.34 |
| median | -83.65 |
| Q3 | -80.41 |
| 95-th percentile | -73.99 |
| Maximum | -67.07 |
| Range | 56.33 |
| Interquartile range (IQR) | 16.93 |
Descriptive statistics
| Standard deviation | 13.975913 |
|---|---|
| Coefficient of variation (CV) | -0.15445061 |
| Kurtosis | -0.036098161 |
| Mean | -90.487914 |
| Median Absolute Deviation (MAD) | 5.03 |
| Skewness | -1.0305054 |
| Sum | -33061479 |
| Variance | 195.32616 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| -80.18 | 5627 | 1.5% |
| -80.13 | 4111 | 1.1% |
| -80.14 | 3022 | 0.8% |
| -80.27 | 3004 | 0.8% |
| -98.73 | 2576 | 0.7% |
| -80.23 | 2497 | 0.7% |
| -80.24 | 2490 | 0.7% |
| -80.17 | 2304 | 0.6% |
| -73.95 | 2147 | 0.6% |
| -81.21 | 2109 | 0.6% |
| Other values (1945) | 335482 | 91.8% |
| Value | Count | Frequency (%) |
| -123.4 | 26 | < 0.1% |
| -123.22 | 17 | < 0.1% |
| -123.19 | 10 | < 0.1% |
| -123.13 | 4 | < 0.1% |
| -123.11 | 17 | < 0.1% |
| -123.08 | 27 | < 0.1% |
| -123.07 | 2 | < 0.1% |
| -123.06 | 87 | < 0.1% |
| -123.05 | 4 | < 0.1% |
| -123.04 | 71 | < 0.1% |
| Value | Count | Frequency (%) |
| -67.07 | 1 | < 0.1% |
| -67.16 | 46 | < 0.1% |
| -69.55 | 1 | < 0.1% |
| -69.6 | 63 | < 0.1% |
| -69.71 | 81 | < 0.1% |
| -69.79 | 43 | < 0.1% |
| -70.22 | 2 | < 0.1% |
| -70.29 | 13 | < 0.1% |
| -70.3 | 4 | < 0.1% |
| -70.34 | 7 | < 0.1% |
state_zip
Categorical
HIGH CORRELATION 
| Distinct | 34 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.6 MiB |
| FL | 113038 |
|---|---|
| TX | 81563 |
| NY | 23269 |
| CA | 22989 |
| NC | 21378 |
| Other values (29) | 103132 |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
Characters and Unicode
| Total characters | 730738 |
|---|---|
| Distinct characters | 24 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | NC |
|---|---|
| 2nd row | WA |
| 3rd row | CA |
| 4th row | TX |
| 5th row | FL |
Common Values
| Value | Count | Frequency (%) |
| FL | 113038 | 30.9% |
| TX | 81563 | 22.3% |
| NY | 23269 | 6.4% |
| CA | 22989 | 6.3% |
| NC | 21378 | 5.9% |
| TN | 17655 | 4.8% |
| WA | 13530 | 3.7% |
| OH | 11743 | 3.2% |
| NV | 8357 | 2.3% |
| IL | 8102 | 2.2% |
| Other values (24) | 43745 | 12.0% |
Words
| Value | Count | Frequency (%) |
| fl | 113038 | 30.9% |
| tx | 81563 | 22.3% |
| ny | 23269 | 6.4% |
| ca | 22989 | 6.3% |
| nc | 21378 | 5.9% |
| tn | 17655 | 4.8% |
| wa | 13530 | 3.7% |
| oh | 11743 | 3.2% |
| nv | 8357 | 2.3% |
| il | 8102 | 2.2% |
| Other values (24) | 43745 | 12.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| L | 121141 | 16.6% |
| F | 113038 | 15.5% |
| T | 101253 | 13.9% |
| X | 81563 | 11.2% |
| N | 74214 | 10.2% |
| C | 54927 | 7.5% |
| A | 53756 | 7.4% |
| Y | 23357 | 3.2% |
| O | 21388 | 2.9% |
| I | 16878 | 2.3% |
| Other values (14) | 69223 | 9.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 730738 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 730738 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 730738 | 100.0% |
median_household_income
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 4016 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 866 |
| Missing (%) | 0.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 57331.753 |
| Minimum | 9106 |
|---|---|
| Maximum | 230952 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 9106 |
|---|---|
| 5-th percentile | 27180 |
| Q1 | 41410 |
| median | 52316 |
| Q3 | 68977 |
| 95-th percentile | 103309 |
| Maximum | 230952 |
| Range | 221846 |
| Interquartile range (IQR) | 27567 |
Descriptive statistics
| Standard deviation | 23358.446 |
|---|---|
| Coefficient of variation (CV) | 0.40742599 |
| Kurtosis | 2.362639 |
| Mean | 57331.753 |
| Median Absolute Deviation (MAD) | 13211 |
| Skewness | 1.2034324 |
| Sum | 2.0897596 × 10¹⁰ |
| Variance | 5.4561701 × 10⁸ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 51153 | 2104 | 0.6% |
| 78094 | 1550 | 0.4% |
| 64823 | 1474 | 0.4% |
| 59785 | 1361 | 0.4% |
| 48959 | 1298 | 0.4% |
| 57843 | 1295 | 0.4% |
| 85508 | 1252 | 0.3% |
| 42163 | 1240 | 0.3% |
| 76649 | 1211 | 0.3% |
| 22813 | 1161 | 0.3% |
| Other values (4006) | 350557 | 95.9% |
| Value | Count | Frequency (%) |
| 9106 | 8 | < 0.1% |
| 9475 | 1 | < 0.1% |
| 9954 | 19 | < 0.1% |
| 10422 | 10 | < 0.1% |
| 12143 | 104 | < 0.1% |
| 12149 | 1 | < 0.1% |
| 12457 | 30 | < 0.1% |
| 12534 | 6 | < 0.1% |
| 13350 | 3 | < 0.1% |
| 13415 | 9 | < 0.1% |
| Value | Count | Frequency (%) |
| 230952 | 10 | < 0.1% |
| 216037 | 111 | < 0.1% |
| 214219 | 2 | < 0.1% |
| 196637 | 1 | < 0.1% |
| 192648 | 7 | < 0.1% |
| 187768 | 3 | < 0.1% |
| 183966 | 3 | < 0.1% |
| 183833 | 10 | < 0.1% |
| 180583 | 1 | < 0.1% |
| 180540 | 9 | < 0.1% |
housing_units
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 3807 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 866 |
| Missing (%) | 0.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14793.575 |
| Minimum | 38 |
|---|---|
| Maximum | 47617 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 38 |
|---|---|
| 5-th percentile | 4301 |
| Q1 | 9817 |
| median | 14089 |
| Q3 | 19008 |
| 95-th percentile | 27877 |
| Maximum | 47617 |
| Range | 47579 |
| Interquartile range (IQR) | 9191 |
Descriptive statistics
| Standard deviation | 7107.8074 |
|---|---|
| Coefficient of variation (CV) | 0.48046585 |
| Kurtosis | 0.64072833 |
| Mean | 14793.575 |
| Median Absolute Deviation (MAD) | 4502 |
| Skewness | 0.6787794 |
| Sum | 5.3923024 × 10⁹ |
| Variance | 50520927 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 20744 | 2104 | 0.6% |
| 13481 | 1550 | 0.4% |
| 20571 | 1474 | 0.4% |
| 9817 | 1424 | 0.4% |
| 18749 | 1361 | 0.4% |
| 10825 | 1298 | 0.4% |
| 7504 | 1295 | 0.4% |
| 13256 | 1247 | 0.3% |
| 15250 | 1211 | 0.3% |
| 14259 | 1161 | 0.3% |
| Other values (3797) | 350378 | 95.9% |
| Value | Count | Frequency (%) |
| 38 | 2 | < 0.1% |
| 59 | 1 | < 0.1% |
| 63 | 1 | < 0.1% |
| 88 | 1 | < 0.1% |
| 96 | 2 | < 0.1% |
| 107 | 1 | < 0.1% |
| 110 | 1 | < 0.1% |
| 113 | 1 | < 0.1% |
| 128 | 2 | < 0.1% |
| 130 | 6 | < 0.1% |
| Value | Count | Frequency (%) |
| 47617 | 191 | 0.1% |
| 41483 | 228 | 0.1% |
| 39547 | 161 | < 0.1% |
| 39402 | 364 | 0.1% |
| 38453 | 154 | < 0.1% |
| 37951 | 306 | 0.1% |
| 37745 | 147 | < 0.1% |
| 37619 | 434 | 0.1% |
| 37598 | 95 | < 0.1% |
| 37432 | 67 | < 0.1% |
occupied_housing_units
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 3769 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 866 |
| Missing (%) | 0.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12960.053 |
| Minimum | 33 |
|---|---|
| Maximum | 44432 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 33 |
|---|---|
| 5-th percentile | 3705 |
| Q1 | 8506 |
| median | 12354 |
| Q3 | 16556 |
| 95-th percentile | 24404 |
| Maximum | 44432 |
| Range | 44399 |
| Interquartile range (IQR) | 8050 |
Descriptive statistics
| Standard deviation | 6437.3506 |
|---|---|
| Coefficient of variation (CV) | 0.49670713 |
| Kurtosis | 0.85429206 |
| Mean | 12960.053 |
| Median Absolute Deviation (MAD) | 3997 |
| Skewness | 0.74138286 |
| Sum | 4.723978 × 10⁹ |
| Variance | 41439482 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 16012 | 2104 | 0.6% |
| 8775 | 1574 | 0.4% |
| 5150 | 1474 | 0.4% |
| 17691 | 1361 | 0.4% |
| 5672 | 1329 | 0.4% |
| 8731 | 1298 | 0.4% |
| 9435 | 1252 | 0.3% |
| 10181 | 1240 | 0.3% |
| 14708 | 1211 | 0.3% |
| 12087 | 1161 | 0.3% |
| Other values (3759) | 350499 | 95.9% |
| Value | Count | Frequency (%) |
| 33 | 2 | < 0.1% |
| 53 | 1 | < 0.1% |
| 55 | 1 | < 0.1% |
| 58 | 15 | < 0.1% |
| 73 | 1 | < 0.1% |
| 76 | 1 | < 0.1% |
| 81 | 7 | < 0.1% |
| 85 | 2 | < 0.1% |
| 87 | 2 | < 0.1% |
| 92 | 18 | < 0.1% |
| Value | Count | Frequency (%) |
| 44432 | 191 | 0.1% |
| 37865 | 228 | 0.1% |
| 36101 | 147 | < 0.1% |
| 35524 | 67 | < 0.1% |
| 35407 | 161 | < 0.1% |
| 34850 | 306 | 0.1% |
| 34843 | 109 | < 0.1% |
| 34452 | 134 | < 0.1% |
| 34383 | 364 | 0.1% |
| 34330 | 154 | < 0.1% |
major_city_ch
Categorical
HIGH CORRELATION 
| Distinct | 26 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.6 MiB |
| others | 167072 |
|---|---|
| Houston | 24054 |
| Miami | 21292 |
| San Antonio | 15918 |
| Fort Lauderdale | 12060 |
| Other values (21) | 124973 |
Length
| Max length | 16 |
|---|---|
| Median length | 15 |
| Mean length | 7.4823261 |
| Min length | 5 |
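For a column with no missing values, the mean length is the total character count divided by the number of rows. A quick check against the figures reported for major_city_ch, with state_zip (two characters per row) as a second case:

```python
n_observations = 365_369

# major_city_ch: "Total characters" from this report, no missing values.
major_city_total_chars = 2_733_810
major_city_mean_length = major_city_total_chars / n_observations

# state_zip: every value is a two-letter code.
state_zip_total_chars = 2 * n_observations

print(round(major_city_mean_length, 7), state_zip_total_chars)
```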
Characters and Unicode
| Total characters | 2733810 |
|---|---|
| Distinct characters | 40 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | others |
|---|---|
| 2nd row | others |
| 3rd row | Los Angeles |
| 4th row | Dallas |
| 5th row | others |
Common Values
| Value | Count | Frequency (%) |
| others | 167072 | 45.7% |
| Houston | 24054 | 6.6% |
| Miami | 21292 | 5.8% |
| San Antonio | 15918 | 4.4% |
| Fort Lauderdale | 12060 | 3.3% |
| Jacksonville | 9612 | 2.6% |
| Dallas | 8822 | 2.4% |
| Cleveland | 7087 | 1.9% |
| Orlando | 7012 | 1.9% |
| Brooklyn | 6901 | 1.9% |
| Other values (16) | 85539 | 23.4% |
Words
| Value | Count | Frequency (%) |
| others | 167072 | 39.3% |
| miami | 25054 | 5.9% |
| houston | 24054 | 5.7% |
| fort | 17890 | 4.2% |
| san | 15918 | 3.7% |
| antonio | 15918 | 3.7% |
| lauderdale | 12060 | 2.8% |
| jacksonville | 9612 | 2.3% |
| dallas | 8822 | 2.1% |
| cleveland | 7087 | 1.7% |
| Other values (22) | 121329 | 28.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| o | 329213 | |
| e | 287003 | |
| t | 283775 | |
| s | 260743 | |
| r | 240301 | |
| h | 207950 | 7.6% |
| a | 186411 | 6.8% |
| n | 130723 | 4.8% |
| i | 121921 | 4.5% |
| l | 116560 | 4.3% |
| Other values (30) | 569210 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 2416619 | |
| Uppercase Letter | 257744 | 9.4% |
| Space Separator | 59447 | 2.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| o | 329213 | |
| e | 287003 | |
| t | 283775 | |
| s | 260743 | |
| r | 240301 | |
| h | 207950 | |
| a | 186411 | |
| n | 130723 | 5.4% |
| i | 121921 | 5.0% |
| l | 116560 | 4.8% |
| Other values (11) | 252019 |
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 30352 | |
| S | 25904 | |
| M | 25054 | |
| H | 24054 | |
| L | 22178 | 8.6% |
| C | 19945 | 7.7% |
| F | 17890 | 6.9% |
| N | 11537 | 4.5% |
| B | 10663 | 4.1% |
| W | 10225 | 4.0% |
| Other values (8) | 59942 |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 59447 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2674363 | |
| Common | 59447 | 2.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| o | 329213 | |
| e | 287003 | |
| t | 283775 | |
| s | 260743 | |
| r | 240301 | |
| h | 207950 | |
| a | 186411 | 7.0% |
| n | 130723 | 4.9% |
| i | 121921 | 4.6% |
| l | 116560 | 4.4% |
| Other values (29) | 509763 |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 59447 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2733810 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| o | 329213 | |
| e | 287003 | |
| t | 283775 | |
| s | 260743 | |
| r | 240301 | |
| h | 207950 | 7.6% |
| a | 186411 | 6.8% |
| n | 130723 | 4.8% |
| i | 121921 | 4.5% |
| l | 116560 | 4.3% |
| Other values (30) | 569210 |
status_cl
Categorical
IMBALANCE 
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.6 MiB |
| Value | Count |
|---|---|
| active | 298185 |
| not defined | 39990 |
| foreclosure | 9411 |
| new construction | 5352 |
| pending | 5217 |
| Other values (5) | 7214 |
Length
| Max length | 16 |
|---|---|
| Median length | 6 |
| Mean length | 6.93552 |
| Min length | 6 |
Characters and Unicode
| Total characters | 2534024 |
|---|---|
| Distinct characters | 19 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | active |
|---|---|
| 2nd row | active |
| 3rd row | active |
| 4th row | active |
| 5th row | active |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| active | 298185 | 81.6% |
| not defined | 39990 | 10.9% |
| foreclosure | 9411 | 2.6% |
| new construction | 5352 | 1.5% |
| pending | 5217 | 1.4% |
| under contract | 3710 | 1.0% |
| auction | 1308 | 0.4% |
| contingent | 1066 | 0.3% |
| others | 722 | 0.2% |
| for rent | 408 | 0.1% |
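The IMBALANCE flag on `status_cl` corresponds to the 68.4% figure in the report overview. One common way to get such a score is one minus the Shannon entropy of the class shares, normalized by the maximum entropy for that number of classes. That this is the formula behind the report is an assumption; the sketch below applies it to the counts in the table above, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits) of the class shares."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Counts copied from the status_cl Common Values table.
status_counts = [298185, 39990, 9411, 5352, 5217, 3710, 1308, 1066, 722, 408]
score = imbalance_score(status_counts)
print(f"imbalance score: {score:.3f}")
```

A score of 0 means perfectly balanced classes and 1 means a single class holds every row. The value for these counts comes out close to 0.68, in line with the overview.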
Words
| Value | Count | Frequency (%) |
|---|---|---|
| active | 298185 | 71.9% |
| not | 39990 | 9.6% |
| defined | 39990 | 9.6% |
| foreclosure | 9411 | 2.3% |
| new | 5352 | 1.3% |
| construction | 5352 | 1.3% |
| pending | 5217 | 1.3% |
| under | 3710 | 0.9% |
| contract | 3710 | 0.9% |
| auction | 1308 | 0.3% |
| Other values (4) | 2604 | 0.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 413462 | |
| t | 360869 | |
| i | 351118 | |
| c | 328094 | |
| a | 303203 | |
| v | 298185 | |
| n | 118804 | 4.7% |
| d | 88907 | 3.5% |
| o | 76730 | 3.0% |
| f | 49809 | 2.0% |
| Other values (9) | 144843 | 5.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 2484564 | |
| Space Separator | 49460 | 2.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 413462 | |
| t | 360869 | |
| i | 351118 | |
| c | 328094 | |
| a | 303203 | |
| v | 298185 | |
| n | 118804 | 4.8% |
| d | 88907 | 3.6% |
| o | 76730 | 3.1% |
| f | 49809 | 2.0% |
| Other values (8) | 95383 | 3.8% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 49460 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2484564 | |
| Common | 49460 | 2.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 413462 | |
| t | 360869 | |
| i | 351118 | |
| c | 328094 | |
| a | 303203 | |
| v | 298185 | |
| n | 118804 | 4.8% |
| d | 88907 | 3.6% |
| o | 76730 | 3.1% |
| f | 49809 | 2.0% |
| Other values (8) | 95383 | 3.8% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 49460 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2534024 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 413462 | |
| t | 360869 | |
| i | 351118 | |
| c | 328094 | |
| a | 303203 | |
| v | 298185 | |
| n | 118804 | 4.7% |
| d | 88907 | 3.5% |
| o | 76730 | 3.0% |
| f | 49809 | 2.0% |
| Other values (9) | 144843 | 5.7% |
property_type_fin
Categorical
| Distinct | 11 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.6 MiB |
| Value | Count |
|---|---|
| single-family | 192922 |
| condo | 50155 |
| na | 33361 |
| land | 29397 |
| townhouse | 17812 |
| Other values (6) | 41722 |
Length
| Max length | 13 |
|---|---|
| Median length | 13 |
| Mean length | 9.420063 |
| Min length | 2 |
Characters and Unicode
| Total characters | 3441799 |
|---|---|
| Distinct characters | 21 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | single-family |
|---|---|
| 2nd row | single-family |
| 3rd row | single-family |
| 4th row | single-family |
| 5th row | land |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| single-family | 192922 | 52.8% |
| condo | 50155 | 13.7% |
| na | 33361 | 9.1% |
| land | 29397 | 8.0% |
| townhouse | 17812 | 4.9% |
| others | 14603 | 4.0% |
| multi-family | 11332 | 3.1% |
| traditional | 6036 | 1.7% |
| coop | 3792 | 1.0% |
| mobile | 3459 | 0.9% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| single-family | 192922 | 52.8% |
| condo | 50155 | 13.7% |
| na | 33361 | 9.1% |
| land | 29397 | 8.0% |
| townhouse | 17812 | 4.9% |
| others | 14603 | 4.0% |
| multi-family | 11332 | 3.1% |
| traditional | 6036 | 1.7% |
| coop | 3792 | 1.0% |
| mobile | 3459 | 0.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| l | 447400 | |
| i | 424039 | |
| n | 332183 | |
| a | 281584 | |
| e | 228796 | 6.6% |
| s | 225337 | 6.5% |
| m | 219045 | 6.4% |
| y | 204254 | 5.9% |
| f | 204254 | 5.9% |
| - | 204254 | 5.9% |
| Other values (11) | 670653 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 3237545 | |
| Dash Punctuation | 204254 | 5.9% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| l | 447400 | |
| i | 424039 | |
| n | 332183 | |
| a | 281584 | |
| e | 228796 | |
| s | 225337 | |
| m | 219045 | |
| y | 204254 | |
| f | 204254 | |
| g | 192922 | |
| Other values (10) | 477731 |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 204254 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 3237545 | |
| Common | 204254 | 5.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| l | 447400 | |
| i | 424039 | |
| n | 332183 | |
| a | 281584 | |
| e | 228796 | |
| s | 225337 | |
| m | 219045 | |
| y | 204254 | |
| f | 204254 | |
| g | 192922 | |
| Other values (10) | 477731 |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| - | 204254 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 3441799 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| l | 447400 | |
| i | 424039 | |
| n | 332183 | |
| a | 281584 | |
| e | 228796 | 6.6% |
| s | 225337 | 6.5% |
| m | 219045 | 6.4% |
| y | 204254 | 5.9% |
| f | 204254 | 5.9% |
| - | 204254 | 5.9% |
| Other values (11) | 670653 |
home_age
Real number (ℝ)
ZEROS 
| Distinct | 123 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 37.558143 |
| Minimum | 0 |
| Maximum | 122 |
| Zeros | 29405 |
| Zeros (%) | 8.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 11 |
| median | 36 |
| Q3 | 58 |
| 95-th percentile | 99 |
| Maximum | 122 |
| Range | 122 |
| Interquartile range (IQR) | 47 |
Descriptive statistics
| Standard deviation | 31.084398 |
|---|---|
| Coefficient of variation (CV) | 0.82763407 |
| Kurtosis | -0.26544588 |
| Mean | 37.558143 |
| Median Absolute Deviation (MAD) | 23 |
| Skewness | 0.73884158 |
| Sum | 13722581 |
| Variance | 966.23983 |
| Monotonicity | Not monotonic |
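Several of the figures above are derived from others in the same tables, which gives a quick way to sanity-check a transcription of the report. The sketch below recomputes the coefficient of variation, variance, IQR, range and mean for `home_age` from the reported values only. It does not touch the underlying data, and the variable names are illustrative.

```python
# Values copied from the home_age tables above.
std = 31.084398
mean = 37.558143
total = 13722581          # "Sum"
n_rows = 365369           # number of observations (no missing values in this column)
q1, q3 = 11, 58
minimum, maximum = 0, 122

cv = std / mean                     # coefficient of variation
variance = std ** 2                 # variance is the squared standard deviation
iqr = q3 - q1                       # interquartile range
value_range = maximum - minimum
mean_from_sum = total / n_rows

print(f"CV={cv:.8f} variance={variance:.5f} IQR={iqr} range={value_range} mean={mean_from_sum:.6f}")
```

Each recomputed value agrees with the reported one to the precision shown in the report.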
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 36 | 37467 | 10.3% |
| 3 | 30899 | 8.5% |
| 0 | 29405 | 8.0% |
| 16 | 7908 | 2.2% |
| 17 | 7410 | 2.0% |
| 15 | 7026 | 1.9% |
| 4 | 6669 | 1.8% |
| 18 | 5449 | 1.5% |
| 5 | 5002 | 1.4% |
| 6 | 4966 | 1.4% |
| Other values (113) | 223168 | 61.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 29405 | 8.0% |
| 1 | 55 | < 0.1% |
| 2 | 2261 | 0.6% |
| 3 | 30899 | 8.5% |
| 4 | 6669 | 1.8% |
| 5 | 5002 | 1.4% |
| 6 | 4966 | 1.4% |
| 7 | 3802 | 1.0% |
| 8 | 3044 | 0.8% |
| 9 | 2290 | 0.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 122 | 2570 | 0.7% |
| 121 | 536 | 0.1% |
| 120 | 140 | < 0.1% |
| 119 | 169 | < 0.1% |
| 118 | 196 | 0.1% |
| 117 | 686 | 0.2% |
| 116 | 330 | 0.1% |
| 115 | 315 | 0.1% |
| 114 | 420 | 0.1% |
| 113 | 361 | 0.1% |
remodeled_year_bool
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.6 MiB |
| Value | Count |
|---|---|
| 0 | 217166 |
| 1 | 148203 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 365369 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 217166 | 59.4% |
| 1 | 148203 | 40.6% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 217166 | 59.4% |
| 1 | 148203 | 40.6% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 217166 | 59.4% |
| 1 | 148203 | 40.6% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 365369 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 217166 | 59.4% |
| 1 | 148203 | 40.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 365369 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 217166 | 59.4% |
| 1 | 148203 | 40.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 365369 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 217166 | 59.4% |
| 1 | 148203 | 40.6% |
heating_system
Categorical
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.6 MiB |
| Value | Count |
|---|---|
| AIR | 145200 |
| NO DATA | 89853 |
| OTHER | 60207 |
| NO HEATING NEED | 29397 |
| ELECTRIC | 24730 |
| Other values (2) | 15982 |
Length
| Max length | 15 |
|---|---|
| Median length | 8 |
| Mean length | 5.6425833 |
| Min length | 3 |
Characters and Unicode
| Total characters | 2061625 |
|---|---|
| Distinct characters | 16 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | OTHER |
|---|---|
| 2nd row | NO DATA |
| 3rd row | AIR |
| 4th row | AIR |
| 5th row | NO HEATING NEED |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| AIR | 145200 | 39.7% |
| NO DATA | 89853 | 24.6% |
| OTHER | 60207 | 16.5% |
| NO HEATING NEED | 29397 | 8.0% |
| ELECTRIC | 24730 | 6.8% |
| GAS | 11343 | 3.1% |
| MULTI | 4639 | 1.3% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| air | 145200 | 28.2% |
| no | 119250 | 23.2% |
| data | 89853 | 17.5% |
| other | 60207 | 11.7% |
| heating | 29397 | 5.7% |
| need | 29397 | 5.7% |
| electric | 24730 | 4.8% |
| gas | 11343 | 2.2% |
| multi | 4639 | 0.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| A | 365646 | |
| R | 230137 | |
| T | 208826 | |
| I | 203966 | |
| E | 197858 | |
| O | 179457 | |
| N | 178044 | |
| (space) | 148647 | 7.2% |
| D | 119250 | 5.8% |
| H | 89604 | 4.3% |
| Other values (6) | 140190 | 6.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 1912978 | |
| Space Separator | 148647 | 7.2% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 365646 | |
| R | 230137 | |
| T | 208826 | |
| I | 203966 | |
| E | 197858 | |
| O | 179457 | |
| N | 178044 | |
| D | 119250 | 6.2% |
| H | 89604 | 4.7% |
| C | 49460 | 2.6% |
| Other values (5) | 90730 | 4.7% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 148647 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1912978 | |
| Common | 148647 | 7.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| A | 365646 | |
| R | 230137 | |
| T | 208826 | |
| I | 203966 | |
| E | 197858 | |
| O | 179457 | |
| N | 178044 | |
| D | 119250 | 6.2% |
| H | 89604 | 4.7% |
| C | 49460 | 2.6% |
| Other values (5) | 90730 | 4.7% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 148647 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2061625 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| A | 365646 | |
| R | 230137 | |
| T | 208826 | |
| I | 203966 | |
| E | 197858 | |
| O | 179457 | |
| N | 178044 | |
| (space) | 148647 | 7.2% |
| D | 119250 | 5.8% |
| H | 89604 | 4.3% |
| Other values (6) | 140190 | 6.8% |
mean_rating
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 231 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.9413036 |
| Minimum | 1 |
| Maximum | 9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3.5 |
| median | 5 |
| Q3 | 6.3333333 |
| 95-th percentile | 8 |
| Maximum | 9 |
| Range | 8 |
| Interquartile range (IQR) | 2.8333333 |
Descriptive statistics
| Standard deviation | 1.8305522 |
|---|---|
| Coefficient of variation (CV) | 0.37045936 |
| Kurtosis | -0.67126517 |
| Mean | 4.9413036 |
| Median Absolute Deviation (MAD) | 1.3333333 |
| Skewness | 0.19531527 |
| Sum | 1805399.1 |
| Variance | 3.3509212 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 23331 | 6.4% |
| 4 | 21630 | 5.9% |
| 6 | 20119 | 5.5% |
| 3 | 19008 | 5.2% |
| 6.333333333 | 14849 | 4.1% |
| 4.666666667 | 14700 | 4.0% |
| 3.333333333 | 14687 | 4.0% |
| 7 | 13872 | 3.8% |
| 5.666666667 | 13675 | 3.7% |
| 2 | 13314 | 3.6% |
| Other values (221) | 196184 | 53.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 2068 | 0.6% |
| 1.2 | 1 | < 0.1% |
| 1.25 | 17 | < 0.1% |
| 1.333333333 | 965 | 0.3% |
| 1.4 | 10 | < 0.1% |
| 1.444444444 | 1 | < 0.1% |
| 1.5 | 1821 | 0.5% |
| 1.555555556 | 4 | < 0.1% |
| 1.6 | 60 | < 0.1% |
| 1.625 | 4 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 7059 | 1.9% |
| 8.833333333 | 146 | < 0.1% |
| 8.8 | 8 | < 0.1% |
| 8.75 | 86 | < 0.1% |
| 8.666666667 | 2336 | 0.6% |
| 8.6 | 169 | < 0.1% |
| 8.5 | 2980 | 0.8% |
| 8.4 | 275 | 0.1% |
| 8.333333333 | 2788 | 0.8% |
| 8.25 | 970 | 0.3% |
min_dist_school
Real number (ℝ)
| Distinct | 1529 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.1990203 |
| Minimum | 0 |
| Maximum | 48.74 |
| Zeros | 747 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.13 |
| Q1 | 0.37 |
| median | 0.7 |
| Q3 | 1.3 |
| 95-th percentile | 3.65 |
| Maximum | 48.74 |
| Range | 48.74 |
| Interquartile range (IQR) | 0.93 |
Descriptive statistics
| Standard deviation | 2.0167967 |
|---|---|
| Coefficient of variation (CV) | 1.6820372 |
| Kurtosis | 117.34197 |
| Mean | 1.1990203 |
| Median Absolute Deviation (MAD) | 0.4 |
| Skewness | 8.670267 |
| Sum | 438084.84 |
| Variance | 4.0674691 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.3 | 21717 | 5.9% |
| 0.4 | 20311 | 5.6% |
| 0.2 | 19527 | 5.3% |
| 0.5 | 18232 | 5.0% |
| 0.6 | 15616 | 4.3% |
| 0.7 | 13408 | 3.7% |
| 0.1 | 12080 | 3.3% |
| 0.8 | 10758 | 2.9% |
| 0.9 | 9362 | 2.6% |
| 1.1 | 7333 | 2.0% |
| Other values (1519) | 217025 | 59.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 747 | 0.2% |
| 0.01 | 1 | < 0.1% |
| 0.02 | 21 | < 0.1% |
| 0.03 | 111 | < 0.1% |
| 0.04 | 178 | < 0.1% |
| 0.05 | 317 | 0.1% |
| 0.06 | 380 | 0.1% |
| 0.07 | 485 | 0.1% |
| 0.08 | 520 | 0.1% |
| 0.09 | 654 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 48.74 | 1 | < 0.1% |
| 45.13 | 3 | < 0.1% |
| 40.65 | 1 | < 0.1% |
| 40.3 | 1 | < 0.1% |
| 39.69 | 180 | < 0.1% |
| 39.35 | 1 | < 0.1% |
| 37.86 | 1 | < 0.1% |
| 36.3 | 1 | < 0.1% |
| 33.7 | 1 | < 0.1% |
| 33.6 | 2 | < 0.1% |
| | zipcode | target_clean | log_target_clean | sqft_clean | log_sqft_clean | baths_clean | population | lat | lng | median_household_income | housing_units | occupied_housing_units | home_age | mean_rating | min_dist_school | private_pool_join | mlsid_join_bool | fireplace_booled | state_zip | major_city_ch | status_cl | property_type_fin | remodeled_year_bool | heating_system |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| zipcode | 1.000 | 0.012 | 0.012 | 0.130 | 0.130 | 0.030 | -0.036 | 0.122 | -0.936 | 0.111 | -0.148 | -0.101 | -0.064 | 0.074 | -0.034 | 0.142 | 0.084 | 0.276 | 0.917 | 0.661 | 0.121 | 0.178 | 0.277 | 0.180 |
| target_clean | 0.012 | 1.000 | 1.000 | 0.506 | 0.506 | 0.531 | 0.000 | 0.080 | -0.003 | 0.436 | 0.049 | 0.064 | -0.016 | 0.313 | -0.104 | 0.035 | 0.001 | 0.009 | 0.022 | 0.045 | 0.000 | 0.010 | 0.007 | 0.010 |
| log_target_clean | 0.012 | 1.000 | 1.000 | 0.506 | 0.506 | 0.531 | 0.000 | 0.080 | -0.003 | 0.436 | 0.049 | 0.064 | -0.016 | 0.313 | -0.104 | 0.159 | 0.162 | 0.160 | 0.195 | 0.165 | 0.137 | 0.171 | 0.164 | 0.189 |
| sqft_clean | 0.130 | 0.506 | 0.506 | 1.000 | 1.000 | 0.713 | 0.025 | 0.004 | -0.137 | 0.276 | -0.007 | 0.016 | 0.001 | 0.221 | 0.031 | 0.000 | 0.003 | 0.003 | 0.024 | 0.000 | 0.000 | 0.015 | 0.007 | 0.007 |
| log_sqft_clean | 0.130 | 0.506 | 0.506 | 1.000 | 1.000 | 0.713 | 0.025 | 0.004 | -0.137 | 0.276 | -0.007 | 0.016 | 0.001 | 0.221 | 0.031 | 0.187 | 0.108 | 0.317 | 0.092 | 0.104 | 0.040 | 0.295 | 0.291 | 0.280 |
| baths_clean | 0.030 | 0.531 | 0.531 | 0.713 | 0.713 | 1.000 | 0.003 | 0.032 | -0.051 | 0.287 | -0.003 | 0.008 | -0.223 | 0.243 | 0.061 | 0.103 | 0.026 | 0.036 | 0.026 | 0.041 | 0.007 | 0.049 | 0.026 | 0.011 |
| population | -0.036 | 0.000 | 0.000 | 0.025 | 0.025 | 0.003 | 1.000 | 0.137 | 0.029 | -0.104 | 0.885 | 0.939 | 0.127 | -0.020 | -0.267 | 0.054 | 0.091 | 0.087 | 0.206 | 0.209 | 0.036 | 0.090 | 0.099 | 0.083 |
| lat | 0.122 | 0.080 | 0.080 | 0.004 | 0.004 | 0.032 | 0.137 | 1.000 | -0.191 | 0.080 | 0.093 | 0.161 | 0.179 | 0.005 | -0.276 | 0.170 | 0.080 | 0.322 | 0.761 | 0.620 | 0.125 | 0.152 | 0.292 | 0.179 |
| lng | -0.936 | -0.003 | -0.003 | -0.137 | -0.137 | -0.051 | 0.029 | -0.191 | 1.000 | -0.116 | 0.150 | 0.092 | 0.091 | -0.092 | 0.027 | 0.124 | 0.081 | 0.260 | 0.828 | 0.537 | 0.098 | 0.156 | 0.227 | 0.161 |
| median_household_income | 0.111 | 0.436 | 0.436 | 0.276 | 0.276 | 0.287 | -0.104 | 0.080 | -0.116 | 1.000 | -0.106 | -0.062 | -0.167 | 0.597 | 0.065 | 0.123 | 0.060 | 0.157 | 0.162 | 0.189 | 0.040 | 0.072 | 0.091 | 0.073 |
| housing_units | -0.148 | 0.049 | 0.049 | -0.007 | -0.007 | -0.003 | 0.885 | 0.093 | 0.150 | -0.106 | 1.000 | 0.965 | 0.147 | -0.026 | -0.218 | 0.041 | 0.049 | 0.071 | 0.207 | 0.255 | 0.029 | 0.101 | 0.125 | 0.085 |
| occupied_housing_units | -0.101 | 0.064 | 0.064 | 0.016 | 0.016 | 0.008 | 0.939 | 0.161 | 0.092 | -0.062 | 0.965 | 1.000 | 0.155 | 0.009 | -0.261 | 0.051 | 0.063 | 0.088 | 0.207 | 0.237 | 0.036 | 0.110 | 0.124 | 0.100 |
| home_age | -0.064 | -0.016 | -0.016 | 0.001 | 0.001 | -0.223 | 0.127 | 0.179 | 0.091 | -0.167 | 0.147 | 0.155 | 1.000 | -0.167 | -0.285 | 0.183 | 0.168 | 0.168 | 0.173 | 0.166 | 0.075 | 0.233 | 0.408 | 0.257 |
| mean_rating | 0.074 | 0.313 | 0.313 | 0.221 | 0.221 | 0.243 | -0.020 | 0.005 | -0.092 | 0.597 | -0.026 | 0.009 | -0.167 | 1.000 | 0.096 | 0.109 | 0.068 | 0.128 | 0.182 | 0.205 | 0.050 | 0.069 | 0.084 | 0.062 |
| min_dist_school | -0.034 | -0.104 | -0.104 | 0.031 | 0.031 | 0.061 | -0.267 | -0.276 | 0.027 | 0.065 | -0.218 | -0.261 | -0.285 | 0.096 | 1.000 | 0.025 | 0.025 | 0.039 | 0.042 | 0.058 | 0.024 | 0.036 | 0.062 | 0.043 |
| private_pool_join | 0.142 | 0.035 | 0.159 | 0.000 | 0.187 | 0.103 | 0.054 | 0.170 | 0.124 | 0.123 | 0.041 | 0.051 | 0.183 | 0.109 | 0.025 | 1.000 | 0.036 | 0.061 | 0.212 | 0.169 | 0.095 | 0.233 | 0.188 | 0.203 |
| mlsid_join_bool | 0.084 | 0.001 | 0.162 | 0.003 | 0.108 | 0.026 | 0.091 | 0.080 | 0.081 | 0.060 | 0.049 | 0.063 | 0.168 | 0.068 | 0.025 | 0.036 | 1.000 | 0.106 | 0.136 | 0.168 | 0.446 | 0.220 | 0.088 | 0.334 |
| fireplace_booled | 0.276 | 0.009 | 0.160 | 0.003 | 0.317 | 0.036 | 0.087 | 0.322 | 0.260 | 0.157 | 0.071 | 0.088 | 0.168 | 0.128 | 0.039 | 0.061 | 0.106 | 1.000 | 0.347 | 0.277 | 0.102 | 0.297 | 0.120 | 0.260 |
| state_zip | 0.917 | 0.022 | 0.195 | 0.024 | 0.092 | 0.026 | 0.206 | 0.761 | 0.828 | 0.162 | 0.207 | 0.207 | 0.173 | 0.182 | 0.042 | 0.212 | 0.136 | 0.347 | 1.000 | 0.533 | 0.155 | 0.205 | 0.352 | 0.241 |
| major_city_ch | 0.661 | 0.045 | 0.165 | 0.000 | 0.104 | 0.041 | 0.209 | 0.620 | 0.537 | 0.189 | 0.255 | 0.237 | 0.166 | 0.205 | 0.058 | 0.169 | 0.168 | 0.277 | 0.533 | 1.000 | 0.180 | 0.232 | 0.429 | 0.268 |
| status_cl | 0.121 | 0.000 | 0.137 | 0.000 | 0.040 | 0.007 | 0.036 | 0.125 | 0.098 | 0.040 | 0.029 | 0.036 | 0.075 | 0.050 | 0.024 | 0.095 | 0.446 | 0.102 | 0.155 | 0.180 | 1.000 | 0.084 | 0.201 | 0.114 |
| property_type_fin | 0.178 | 0.010 | 0.171 | 0.015 | 0.295 | 0.049 | 0.090 | 0.152 | 0.156 | 0.072 | 0.101 | 0.110 | 0.233 | 0.069 | 0.036 | 0.233 | 0.220 | 0.297 | 0.205 | 0.232 | 0.084 | 1.000 | 0.261 | 0.458 |
| remodeled_year_bool | 0.277 | 0.007 | 0.164 | 0.007 | 0.291 | 0.026 | 0.099 | 0.292 | 0.227 | 0.091 | 0.125 | 0.124 | 0.408 | 0.084 | 0.062 | 0.188 | 0.088 | 0.120 | 0.352 | 0.429 | 0.201 | 0.261 | 1.000 | 0.390 |
| heating_system | 0.180 | 0.010 | 0.189 | 0.007 | 0.280 | 0.011 | 0.083 | 0.179 | 0.161 | 0.073 | 0.085 | 0.100 | 0.257 | 0.062 | 0.043 | 0.203 | 0.334 | 0.260 | 0.241 | 0.268 | 0.114 | 0.458 | 0.390 | 1.000 |
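In the matrix above, `target_clean` and `log_target_clean` have a mutual coefficient of 1.000 and identical coefficients against every numeric column. The same holds for `sqft_clean` and `log_sqft_clean`. A rank-based measure such as Spearman's coefficient would produce exactly this, because a logarithm preserves ordering. The report does not name the measure, so treating it as Spearman for numeric pairs is an assumption. The pure-Python sketch below illustrates the invariance on five price and area pairs copied from the sample rows; the helper names are hypothetical.

```python
import math


def average_ranks(xs):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# target_clean and sqft_clean from sample rows 0, 1, 2, 3 and 5.
price = [418000.0, 310000.0, 2895000.0, 2395000.0, 209000.0]
sqft = [2900, 1947, 3000, 6457, 897]
log_price = [math.log(p) for p in price]

rho_raw = spearman(price, sqft)
rho_log = spearman(log_price, sqft)
rho_self = spearman(price, log_price)
print(rho_raw, rho_log, rho_self)
```

The coefficients against the categorical columns differ between a raw column and its log, so those pairs are presumably computed with a different measure that is sensitive to the transform.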
First rows
| | zipcode | target_clean | log_target_clean | private_pool_join | mlsid_join_bool | sqft_clean | log_sqft_clean | baths_clean | fireplace_booled | population | lat | lng | state_zip | median_household_income | housing_units | occupied_housing_units | major_city_ch | status_cl | property_type_fin | home_age | remodeled_year_bool | heating_system | mean_rating | min_dist_school |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 28387 | 418000.0 | 12.943237 | 0 | 1 | 2900 | 7.972811 | 3.5 | 1 | 13921.0 | 35.18 | -79.40 | NC | 47982.0 | 7608.0 | 6612.0 | others | active | single-family | 3.0 | 0 | OTHER | 5.200000 | 2.700000 |
| 1 | 99216 | 310000.0 | 12.644328 | 0 | 1 | 1947 | 7.574558 | 3.0 | 0 | 24362.0 | 47.69 | -117.19 | WA | 45098.0 | 10906.0 | 10144.0 | others | active | single-family | 3.0 | 0 | NO DATA | 4.000000 | 1.010000 |
| 2 | 90049 | 2895000.0 | 14.878496 | 1 | 1 | 3000 | 8.006701 | 2.0 | 1 | 35482.0 | 34.08 | -118.49 | CA | 110854.0 | 18097.0 | 16657.0 | Los Angeles | active | single-family | 61.0 | 1 | AIR | 6.666667 | 1.190000 |
| 3 | 75205 | 2395000.0 | 14.688894 | 0 | 1 | 6457 | 8.773075 | 8.0 | 1 | 23061.0 | 32.84 | -96.80 | TX | 108913.0 | 9985.0 | 9016.0 | Dallas | active | single-family | 16.0 | 1 | AIR | 9.000000 | 0.100000 |
| 4 | 32908 | 5000.0 | 8.517193 | 0 | 1 | 0 | 0.000000 | NaN | 0 | 10892.0 | 27.96 | -80.70 | FL | 46466.0 | 4230.0 | 3657.0 | others | active | land | 0.0 | 0 | NO HEATING NEED | 4.666667 | 3.030000 |
| 5 | 19145 | 209000.0 | 12.250090 | 0 | 1 | 897 | 6.800170 | NaN | 0 | 47261.0 | 39.91 | -75.20 | PA | 35761.0 | 20874.0 | 18802.0 | Philadelphia | active | townhouse | 102.0 | 0 | AIR | 4.938800 | 1.232491 |
| 6 | 34759 | 181500.0 | 12.109011 | 0 | 1 | 1507 | 7.318540 | NaN | 0 | 30170.0 | 28.08 | -81.44 | FL | 42163.0 | 13256.0 | 10181.0 | Kissimmee | active | others | 16.0 | 1 | ELECTRIC | 2.333333 | 0.800000 |
| 7 | 38115 | 68000.0 | 11.127263 | 0 | 1 | 0 | 0.000000 | NaN | 0 | 39129.0 | 35.05 | -89.86 | TN | 29230.0 | 18726.0 | 15409.0 | others | active | na | 46.0 | 0 | NO DATA | 2.666667 | 0.400000 |
| 8 | 50401 | 244900.0 | 12.408605 | 0 | 1 | 3588 | 8.185629 | 2.0 | 0 | 29837.0 | 43.15 | -93.19 | IA | 42458.0 | 14147.0 | 13103.0 | others | active | single-family | 52.0 | 0 | AIR | 3.800000 | 5.600000 |
| 9 | 77080 | 311995.0 | 12.650742 | 0 | 1 | 1930 | 7.565793 | 3.0 | 0 | 45275.0 | 29.82 | -95.52 | TX | 38159.0 | 16376.0 | 14741.0 | Houston | not defined | single-family | 3.0 | 0 | GAS | 3.000000 | 0.600000 |
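The sample rows suggest how the two log columns were derived, although the report does not document it. `log_target_clean` matches the natural logarithm of `target_clean`, while `log_sqft_clean` matches `log(1 + sqft_clean)`, which would also explain why an area of 0 maps to 0.000000. The sketch below checks that reading against three of the rows above; `max_abs_error` is a hypothetical helper.

```python
import math

# (target_clean, log_target_clean, sqft_clean, log_sqft_clean) copied from rows 0, 1 and 4.
rows = [
    (418000.0, 12.943237, 2900, 7.972811),
    (310000.0, 12.644328, 1947, 7.574558),
    (5000.0, 8.517193, 0, 0.000000),
]


def max_abs_error(pairs, transform):
    """Largest absolute gap between transform(raw) and the reported value."""
    return max(abs(transform(raw) - reported) for raw, reported in pairs)


target_err = max_abs_error([(t, lt) for t, lt, _, _ in rows], math.log)
sqft_err = max_abs_error([(s, ls) for _, _, s, ls in rows], math.log1p)
print(f"log(target) max error: {target_err:.2e}")
print(f"log1p(sqft) max error: {sqft_err:.2e}")
```

Both errors stay within the rounding of the six decimals shown, whereas a plain `log` on `sqft_clean` would be off in the fourth decimal and undefined at 0.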
Last rows
| | zipcode | target_clean | log_target_clean | private_pool_join | mlsid_join_bool | sqft_clean | log_sqft_clean | baths_clean | fireplace_booled | population | lat | lng | state_zip | median_household_income | housing_units | occupied_housing_units | major_city_ch | status_cl | property_type_fin | home_age | remodeled_year_bool | heating_system | mean_rating | min_dist_school |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 374400 | 78212 | 799500.0 | 13.591742 | 1 | 1 | 950 | 6.857514 | NaN | 0 | 28415.0 | 29.45 | -98.50 | TX | 36613.0 | 13137.0 | 11544.0 | San Antonio | active | single-family | 84.0 | 1 | OTHER | 4.000000 | 0.25 |
| 374401 | 77080 | 280000.0 | 12.542545 | 0 | 1 | 1792 | 7.491645 | 3.0 | 1 | 45275.0 | 29.82 | -95.52 | TX | 38159.0 | 16376.0 | 14741.0 | Houston | active | single-family | 52.0 | 1 | OTHER | 2.666667 | 0.19 |
| 374402 | 32805 | 171306.0 | 12.051207 | 0 | 0 | 1829 | 7.512071 | 2.0 | 1 | 21810.0 | 28.53 | -81.40 | FL | 23950.0 | 9765.0 | 8278.0 | Orlando | not defined | single-family | 60.0 | 1 | AIR | 2.333333 | 1.10 |
| 374403 | 76110 | 199900.0 | 12.205573 | 0 | 1 | 1895 | 7.547502 | NaN | 0 | 30434.0 | 32.71 | -97.34 | TX | 39600.0 | 11027.0 | 9764.0 | Fort Worth | active | single-family | 101.0 | 0 | NO DATA | 5.000000 | 0.50 |
| 374404 | 77089 | 252990.0 | 12.441105 | 0 | 0 | 1841 | 7.518607 | 2.0 | 0 | 48685.0 | 29.59 | -95.23 | TX | 63141.0 | 16805.0 | 15902.0 | Houston | not defined | single-family | 3.0 | 0 | NO DATA | 6.000000 | 0.30 |
| 374405 | 20001 | 799000.0 | 13.591116 | 0 | 1 | 1417 | 7.257003 | 3.0 | 0 | 38551.0 | 38.91 | -77.02 | DC | 78848.0 | 18751.0 | 16500.0 | Washington | active | condo | 12.0 | 0 | AIR | 3.000000 | 0.10 |
| 374406 | 33180 | 1249000.0 | 14.037854 | 1 | 1 | 4017 | 8.298540 | 6.0 | 0 | 30840.0 | 25.96 | -80.14 | FL | 68317.0 | 20316.0 | 14197.0 | Miami | not defined | single-family | 32.0 | 1 | OTHER | 5.000000 | 1.10 |
| 374407 | 60657 | 674999.0 | 13.422466 | 0 | 1 | 2000 | 7.601402 | 3.0 | 0 | 65996.0 | 41.94 | -87.65 | IL | 75885.0 | 41483.0 | 37865.0 | Chicago | active | condo | 98.0 | 0 | OTHER | 4.333333 | 0.40 |
| 374408 | 11434 | 528000.0 | 13.176852 | 0 | 0 | 1152 | 7.050123 | 3.0 | 0 | 59129.0 | 40.68 | -73.78 | NY | 59229.0 | 21681.0 | 20244.0 | others | active | single-family | 72.0 | 1 | OTHER | 4.500000 | 0.48 |
| 374409 | 78218 | 204900.0 | 12.230277 | 0 | 1 | 1462 | 7.288244 | 2.0 | 0 | 31917.0 | 29.49 | -98.40 | TX | 38812.0 | 13900.0 | 12367.0 | San Antonio | not defined | single-family | 3.0 | 0 | ELECTRIC | 4.000000 | 0.30 |